Overview

Dataset statistics

Number of variables: 24
Number of observations: 30000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 35
Duplicate rows (%): 0.1%
Total size in memory: 5.5 MiB
Average record size in memory: 192.0 B
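
The percentage and size figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming 1 MiB = 1,048,576 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" table above.
n_rows = 30000
duplicate_rows = 35
avg_record_bytes = 192.0

# Share of duplicate rows, rounded the way the report prints it.
duplicate_pct = round(100 * duplicate_rows / n_rows, 1)

# Total memory implied by the average record size (1 MiB = 1024 * 1024 bytes).
implied_total_mib = n_rows * avg_record_bytes / (1024 * 1024)

print(f"Duplicate rows (%): {duplicate_pct}")
print(f"Implied total size: {implied_total_mib:.1f} MiB")
```

Both results agree with the rounded values printed in the table.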

Variable types

Numeric: 21
Categorical: 3

Alerts

Dataset has 35 (0.1%) duplicate rows [Duplicates]
PAY_0 is highly correlated with PAY_2 and 2 other fields [High correlation]
PAY_2 is highly correlated with PAY_0 and 7 other fields [High correlation]
PAY_3 is highly correlated with PAY_0 and 9 other fields [High correlation]
PAY_4 is highly correlated with PAY_0 and 10 other fields [High correlation]
PAY_5 is highly correlated with PAY_2 and 8 other fields [High correlation]
PAY_6 is highly correlated with PAY_2 and 8 other fields [High correlation]
BILL_AMT1 is highly correlated with PAY_2 and 8 other fields [High correlation]
BILL_AMT2 is highly correlated with PAY_2 and 10 other fields [High correlation]
BILL_AMT3 is highly correlated with PAY_2 and 11 other fields [High correlation]
BILL_AMT4 is highly correlated with PAY_3 and 13 other fields [High correlation]
BILL_AMT5 is highly correlated with PAY_3 and 13 other fields [High correlation]
BILL_AMT6 is highly correlated with PAY_4 and 11 other fields [High correlation]
PAY_AMT1 is highly correlated with BILL_AMT1 and 5 other fields [High correlation]
PAY_AMT2 is highly correlated with BILL_AMT3 and 5 other fields [High correlation]
PAY_AMT3 is highly correlated with BILL_AMT4 and 7 other fields [High correlation]
PAY_AMT4 is highly correlated with BILL_AMT4 and 6 other fields [High correlation]
PAY_AMT5 is highly correlated with BILL_AMT4 and 5 other fields [High correlation]
PAY_AMT6 is highly correlated with BILL_AMT5 and 4 other fields [High correlation]
PAY_0 is highly correlated with PAY_2 and 3 other fields [High correlation]
PAY_2 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_3 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_4 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_5 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_6 is highly correlated with PAY_2 and 3 other fields [High correlation]
BILL_AMT1 is highly correlated with BILL_AMT2 and 4 other fields [High correlation]
BILL_AMT2 is highly correlated with BILL_AMT1 and 4 other fields [High correlation]
BILL_AMT3 is highly correlated with BILL_AMT1 and 4 other fields [High correlation]
BILL_AMT4 is highly correlated with BILL_AMT1 and 4 other fields [High correlation]
BILL_AMT5 is highly correlated with BILL_AMT1 and 4 other fields [High correlation]
BILL_AMT6 is highly correlated with BILL_AMT1 and 4 other fields [High correlation]
PAY_0 is highly correlated with PAY_2 and 1 other field [High correlation]
PAY_2 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_3 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_4 is highly correlated with PAY_2 and 3 other fields [High correlation]
PAY_5 is highly correlated with PAY_2 and 4 other fields [High correlation]
PAY_6 is highly correlated with PAY_2 and 5 other fields [High correlation]
BILL_AMT1 is highly correlated with BILL_AMT2 and 4 other fields [High correlation]
BILL_AMT2 is highly correlated with BILL_AMT1 and 5 other fields [High correlation]
BILL_AMT3 is highly correlated with BILL_AMT1 and 5 other fields [High correlation]
BILL_AMT4 is highly correlated with PAY_5 and 5 other fields [High correlation]
BILL_AMT5 is highly correlated with PAY_6 and 6 other fields [High correlation]
BILL_AMT6 is highly correlated with PAY_6 and 6 other fields [High correlation]
PAY_AMT1 is highly correlated with BILL_AMT2 [High correlation]
PAY_AMT2 is highly correlated with BILL_AMT3 [High correlation]
PAY_AMT4 is highly correlated with BILL_AMT5 [High correlation]
PAY_AMT5 is highly correlated with BILL_AMT6 [High correlation]
LIMIT_BAL is highly correlated with BILL_AMT1 and 5 other fields [High correlation]
PAY_0 is highly correlated with PAY_2 and 5 other fields [High correlation]
PAY_2 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_3 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_4 is highly correlated with PAY_0 and 4 other fields [High correlation]
PAY_5 is highly correlated with PAY_0 and 5 other fields [High correlation]
PAY_6 is highly correlated with PAY_0 and 5 other fields [High correlation]
BILL_AMT1 is highly correlated with LIMIT_BAL and 6 other fields [High correlation]
BILL_AMT2 is highly correlated with LIMIT_BAL and 6 other fields [High correlation]
BILL_AMT3 is highly correlated with BILL_AMT1 and 6 other fields [High correlation]
BILL_AMT4 is highly correlated with LIMIT_BAL and 6 other fields [High correlation]
BILL_AMT5 is highly correlated with LIMIT_BAL and 8 other fields [High correlation]
BILL_AMT6 is highly correlated with LIMIT_BAL and 6 other fields [High correlation]
PAY_AMT1 is highly correlated with PAY_AMT2 and 2 other fields [High correlation]
PAY_AMT2 is highly correlated with BILL_AMT3 and 3 other fields [High correlation]
PAY_AMT3 is highly correlated with LIMIT_BAL and 8 other fields [High correlation]
PAY_AMT4 is highly correlated with PAY_AMT1 and 1 other field [High correlation]
PAY_AMT5 is highly correlated with BILL_AMT3 and 1 other field [High correlation]
default.payment.next.month is highly correlated with PAY_0 [High correlation]
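
The same variable pairs recur several times in the list above, presumably because a "High correlation" alert is raised separately for each correlation measure the report computes. The sketch below illustrates the idea with a hand-written Pearson coefficient on toy data. The 0.9 threshold, the helper names, and the data are assumptions for illustration, not values taken from this report's configuration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def highly_correlated(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

# Toy data, not taken from the dataset.
toy = {
    "bill_1": [100, 200, 300, 400, 500],
    "bill_2": [110, 190, 310, 390, 520],
    "age": [40, 25, 33, 51, 29],
}
flagged = highly_correlated(toy)
for a, b, r in flagged:
    print(f"{a} is highly correlated with {b} (r = {r:.3f})")
```

Only the two bill columns are flagged here; the toy age column is nearly uncorrelated with both.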
PAY_AMT2 is highly skewed (γ1 = 30.45381745) [Skewed]
PAY_0 has 14737 (49.1%) zeros [Zeros]
PAY_2 has 15730 (52.4%) zeros [Zeros]
PAY_3 has 15764 (52.5%) zeros [Zeros]
PAY_4 has 16455 (54.9%) zeros [Zeros]
PAY_5 has 16947 (56.5%) zeros [Zeros]
PAY_6 has 16286 (54.3%) zeros [Zeros]
BILL_AMT1 has 2008 (6.7%) zeros [Zeros]
BILL_AMT2 has 2506 (8.4%) zeros [Zeros]
BILL_AMT3 has 2870 (9.6%) zeros [Zeros]
BILL_AMT4 has 3195 (10.7%) zeros [Zeros]
BILL_AMT5 has 3506 (11.7%) zeros [Zeros]
BILL_AMT6 has 4020 (13.4%) zeros [Zeros]
PAY_AMT1 has 5249 (17.5%) zeros [Zeros]
PAY_AMT2 has 5396 (18.0%) zeros [Zeros]
PAY_AMT3 has 5968 (19.9%) zeros [Zeros]
PAY_AMT4 has 6408 (21.4%) zeros [Zeros]
PAY_AMT5 has 6703 (22.3%) zeros [Zeros]
PAY_AMT6 has 7173 (23.9%) zeros [Zeros]
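
The "Zeros" and "Skewed" alerts summarise two per-column statistics: the share of values equal to zero and the skewness γ1. A minimal sketch of both on toy data follows. It uses the population (biased) moment form of γ1, which may differ slightly from the estimator behind the report's figures; the function names and the toy data are illustrative.

```python
def zero_share(values):
    """Fraction of entries equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)

def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Toy payment amounts: mostly small, with one very large value.
payments = [0, 0, 100, 200, 150, 0, 120, 50000]

print(f"zeros: {zero_share(payments):.1%}")
print(f"skewness: {skewness(payments):.2f}")

# The report's own percentage for PAY_0 can be re-derived from its count.
pay_0_zero_pct = round(100 * 14737 / 30000, 1)
print(f"PAY_0 zeros: {pay_0_zero_pct}%")
```

A single large value is enough to push the skewness well above zero, which is the pattern the PAY_AMT2 alert points at.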

Reproduction

Analysis started: 2022-04-05 18:15:38.639624
Analysis finished: 2022-04-05 18:17:41.268608
Duration: 2 minutes and 2.63 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
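
The reported duration is the difference between the two timestamps. A minimal check with the standard library; the format string is an assumption based on how the timestamps are printed:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-04-05 18:15:38.639624", FMT)
finished = datetime.strptime("2022-04-05 18:17:41.268608", FMT)

# Elapsed wall-clock time in seconds, split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```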

Variables

LIMIT_BAL
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 81
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167484.3227
Minimum: 10000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 20000
Q1: 50000
median: 140000
Q3: 240000
95-th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129747.6616
Coefficient of variation (CV): 0.7746854124
Kurtosis: 0.5362628964
Mean: 167484.3227
Median Absolute Deviation (MAD): 90000
Skewness: 0.9928669605
Sum: 5024529680
Variance: 1.683445568 × 10^10
Monotonicity: Not monotonic
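
Several of the LIMIT_BAL figures are simple functions of one another. The sketch below re-derives the mean, coefficient of variation, variance, range, and IQR from the other reported values. It assumes CV is defined as standard deviation divided by mean, which is consistent with the numbers shown; the variable names are illustrative.

```python
# Figures copied from the LIMIT_BAL statistics above.
n = 30000
total = 5024529680           # Sum
std = 129747.6616            # Standard deviation
q1, q3 = 50000, 240000
minimum, maximum = 10000, 1000000

mean = total / n
cv = std / mean              # coefficient of variation
variance = std ** 2
iqr = q3 - q1
value_range = maximum - minimum

print(f"mean={mean:.4f} cv={cv:.6f} variance={variance:.6e}")
print(f"iqr={iqr} range={value_range}")
```

The small difference between `std ** 2` and the printed variance comes from the standard deviation being rounded to four decimal places.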
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
50000               3365    11.2%
20000               1976    6.6%
30000               1610    5.4%
80000               1567    5.2%
200000              1528    5.1%
150000              1110    3.7%
100000              1048    3.5%
180000              995     3.3%
360000              881     2.9%
60000               825     2.8%
Other values (71)   15095   50.3%

Value               Count   Frequency (%)
10000               493     1.6%
16000               2       < 0.1%
20000               1976    6.6%
30000               1610    5.4%
40000               230     0.8%
50000               3365    11.2%
60000               825     2.8%
70000               731     2.4%
80000               1567    5.2%
90000               651     2.2%

Value               Count   Frequency (%)
1000000             1       < 0.1%
800000              2       < 0.1%
780000              2       < 0.1%
760000              1       < 0.1%
750000              4       < 0.1%
740000              2       < 0.1%
730000              2       < 0.1%
720000              3       < 0.1%
710000              6       < 0.1%
700000              8       < 0.1%

SEX
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value   Count   Frequency (%)
2       18112   60.4%
1       11888   39.6%
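
The frequency column is each count divided by the number of observations. A minimal sketch that rebuilds the SEX column from the reported counts and tabulates it with `collections.Counter`; the reconstructed list stands in for the real column:

```python
from collections import Counter

# Rebuild the SEX column from the counts reported above (codes 1 and 2).
sex = [2] * 18112 + [1] * 11888

counts = Counter(sex)
n = len(sex)

# Most frequent value first, with its share of all rows.
for value, count in counts.most_common():
    print(f"{value}\t{count}\t{100 * count / n:.1f}%")
```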

EDUCATION
Real number (ℝ≥0)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.853133333
Minimum: 0
Maximum: 6
Zeros: 14
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 2
Q3: 2
95-th percentile: 3
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7903486597
Coefficient of variation (CV): 0.426493143
Kurtosis: 2.078621603
Mean: 1.853133333
Median Absolute Deviation (MAD): 1
Skewness: 0.9709720486
Sum: 55594
Variance: 0.6246510039
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value   Count   Frequency (%)
2       14030   46.8%
1       10585   35.3%
3       4917    16.4%
5       280     0.9%
4       123     0.4%
6       51      0.2%
0       14      < 0.1%

Value   Count   Frequency (%)
0       14      < 0.1%
1       10585   35.3%
2       14030   46.8%
3       4917    16.4%
4       123     0.4%
5       280     0.9%
6       51      0.2%

Value   Count   Frequency (%)
6       51      0.2%
5       280     0.9%
4       123     0.4%
3       4917    16.4%
2       14030   46.8%
1       10585   35.3%
0       14      < 0.1%
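
Because EDUCATION takes only seven distinct values, its summary statistics can be cross-checked directly from the frequency table. A minimal sketch that recomputes the number of observations, the sum, the mean, and the median from the counts; the dictionary is transcribed from the table and the variable names are illustrative:

```python
# Value counts copied from the EDUCATION frequency table above.
counts = {0: 14, 1: 10585, 2: 14030, 3: 4917, 4: 123, 5: 280, 6: 51}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Median: walk the sorted values until half of the rows are covered.
# With an even n this returns the lower middle element, which is enough
# here because both middle elements fall in the same category.
cumulative = 0
median = None
for value in sorted(counts):
    cumulative += counts[value]
    if cumulative >= n / 2:
        median = value
        break

print(n, total, round(mean, 9), median)
```

The recomputed values match the reported number of observations (30000), Sum (55594), Mean (1.853133333), and median (2).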

MARRIAGE
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
2       15964   53.2%
1       13659   45.5%
3       323     1.1%
0       54      0.2%

AGE
Real number (ℝ≥0)

Distinct: 56
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.4855
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 21
5-th percentile: 23
Q1: 28
median: 34
Q3: 41
95-th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.217904068
Coefficient of variation (CV): 0.2597653709
Kurtosis: 0.04430337824
Mean: 35.4855
Median Absolute Deviation (MAD): 6
Skewness: 0.7322458688
Sum: 1064565
Variance: 84.96975541
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
29                  1605    5.3%
27                  1477    4.9%
28                  1409    4.7%
30                  1395    4.7%
26                  1256    4.2%
31                  1217    4.1%
25                  1186    4.0%
34                  1162    3.9%
32                  1158    3.9%
33                  1146    3.8%
Other values (46)   16989   56.6%

Value               Count   Frequency (%)
21                  67      0.2%
22                  560     1.9%
23                  931     3.1%
24                  1127    3.8%
25                  1186    4.0%
26                  1256    4.2%
27                  1477    4.9%
28                  1409    4.7%
29                  1605    5.3%
30                  1395    4.7%

Value               Count   Frequency (%)
79                  1       < 0.1%
75                  3       < 0.1%
74                  1       < 0.1%
73                  4       < 0.1%
72                  3       < 0.1%
71                  3       < 0.1%
70                  10      < 0.1%
69                  15      0.1%
68                  5       < 0.1%
67                  16      0.1%

PAY_0
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.0167
Minimum: -2
Maximum: 8
Zeros: 14737
Zeros (%): 49.1%
Negative: 8445
Negative (%): 28.1%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.123801528
Coefficient of variation (CV): -67.29350467
Kurtosis: 2.720715042
Mean: -0.0167
Median Absolute Deviation (MAD): 1
Skewness: 0.7319749269
Sum: -501
Variance: 1.262929874
Monotonicity: Not monotonic
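
The large negative coefficient of variation for PAY_0 is a consequence of dividing by a mean that is close to zero and negative, not a sign of unusual spread. A minimal sketch reproducing the figure, assuming CV is standard deviation divided by mean as the reported numbers suggest:

```python
# Figures copied from the PAY_0 statistics above.
std = 1.123801528    # Standard deviation
total = -501         # Sum
n = 30000

mean = total / n     # a small negative mean
cv = std / mean      # the near-zero denominator inflates the ratio

print(f"mean={mean:.4f} cv={cv:.2f}")
```

For variables like this one, which are centred near zero, the standard deviation or IQR is usually a more informative measure of spread than CV.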
Histogram with fixed size bins (bins=11)
Value   Count   Frequency (%)
0       14737   49.1%
-1      5686    19.0%
1       3688    12.3%
-2      2759    9.2%
2       2667    8.9%
3       322     1.1%
4       76      0.3%
5       26      0.1%
8       19      0.1%
6       11      < 0.1%

Value   Count   Frequency (%)
-2      2759    9.2%
-1      5686    19.0%
0       14737   49.1%
1       3688    12.3%
2       2667    8.9%
3       322     1.1%
4       76      0.3%
5       26      0.1%
6       11      < 0.1%
7       9       < 0.1%

Value   Count   Frequency (%)
8       19      0.1%
7       9       < 0.1%
6       11      < 0.1%
5       26      0.1%
4       76      0.3%
3       322     1.1%
2       2667    8.9%
1       3688    12.3%
0       14737   49.1%
-1      5686    19.0%

PAY_2
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.1337666667
Minimum-2
Maximum8
Zeros15730
Zeros (%)52.4%
Negative9832
Negative (%)32.8%
Memory size234.5 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.197185973
Coefficient of variation (CV)-8.949807922
Kurtosis1.57041773
Mean-0.1337666667
Median Absolute Deviation (MAD)0
Skewness0.7905650222
Sum-4013
Variance1.433254254
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
015730
52.4%
-16050
 
20.2%
23927
 
13.1%
-23782
 
12.6%
3326
 
1.1%
499
 
0.3%
128
 
0.1%
525
 
0.1%
720
 
0.1%
612
 
< 0.1%
ValueCountFrequency (%)
-23782
 
12.6%
-16050
 
20.2%
015730
52.4%
128
 
0.1%
23927
 
13.1%
3326
 
1.1%
499
 
0.3%
525
 
0.1%
612
 
< 0.1%
720
 
0.1%
ValueCountFrequency (%)
81
 
< 0.1%
720
 
0.1%
612
 
< 0.1%
525
 
0.1%
499
 
0.3%
3326
 
1.1%
23927
 
13.1%
128
 
0.1%
015730
52.4%
-16050
 
20.2%

PAY_3
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.1662
Minimum-2
Maximum8
Zeros15764
Zeros (%)52.5%
Negative10023
Negative (%)33.4%
Memory size234.5 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.196867568
Coefficient of variation (CV)-7.201369245
Kurtosis2.084435875
Mean-0.1662
Median Absolute Deviation (MAD)0
Skewness0.8406818269
Sum-4986
Variance1.432491976
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
015764
52.5%
-15938
 
19.8%
-24085
 
13.6%
23819
 
12.7%
3240
 
0.8%
476
 
0.3%
727
 
0.1%
623
 
0.1%
521
 
0.1%
14
 
< 0.1%
ValueCountFrequency (%)
-24085
 
13.6%
-15938
 
19.8%
015764
52.5%
14
 
< 0.1%
23819
 
12.7%
3240
 
0.8%
476
 
0.3%
521
 
0.1%
623
 
0.1%
727
 
0.1%
ValueCountFrequency (%)
83
 
< 0.1%
727
 
0.1%
623
 
0.1%
521
 
0.1%
476
 
0.3%
3240
 
0.8%
23819
 
12.7%
14
 
< 0.1%
015764
52.5%
-15938
 
19.8%

PAY_4
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.2206666667
Minimum-2
Maximum8
Zeros16455
Zeros (%)54.9%
Negative10035
Negative (%)33.5%
Memory size234.5 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.169138622
Coefficient of variation (CV)-5.29821128
Kurtosis3.496983496
Mean-0.2206666667
Median Absolute Deviation (MAD)0
Skewness0.9996294133
Sum-6620
Variance1.366885118
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
016455
54.9%
-15687
 
19.0%
-24348
 
14.5%
23159
 
10.5%
3180
 
0.6%
469
 
0.2%
758
 
0.2%
535
 
0.1%
65
 
< 0.1%
12
 
< 0.1%
ValueCountFrequency (%)
-24348
 
14.5%
-15687
 
19.0%
016455
54.9%
12
 
< 0.1%
23159
 
10.5%
3180
 
0.6%
469
 
0.2%
535
 
0.1%
65
 
< 0.1%
758
 
0.2%
ValueCountFrequency (%)
82
 
< 0.1%
758
 
0.2%
65
 
< 0.1%
535
 
0.1%
469
 
0.2%
3180
 
0.6%
23159
 
10.5%
12
 
< 0.1%
016455
54.9%
-15687
 
19.0%

PAY_5
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.2662
Minimum-2
Maximum8
Zeros16947
Zeros (%)56.5%
Negative10085
Negative (%)33.6%
Memory size234.5 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.133187406
Coefficient of variation (CV)-4.256902352
Kurtosis3.989748144
Mean-0.2662
Median Absolute Deviation (MAD)0
Skewness1.008197025
Sum-7986
Variance1.284113697
MonotonicityNot monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
016947
56.5%
-15539
 
18.5%
-24546
 
15.2%
22626
 
8.8%
3178
 
0.6%
484
 
0.3%
758
 
0.2%
517
 
0.1%
64
 
< 0.1%
81
 
< 0.1%
ValueCountFrequency (%)
-24546
 
15.2%
-15539
 
18.5%
016947
56.5%
22626
 
8.8%
3178
 
0.6%
484
 
0.3%
517
 
0.1%
64
 
< 0.1%
758
 
0.2%
81
 
< 0.1%
ValueCountFrequency (%)
81
 
< 0.1%
758
 
0.2%
64
 
< 0.1%
517
 
0.1%
484
 
0.3%
3178
 
0.6%
22626
 
8.8%
016947
56.5%
-15539
 
18.5%
-24546
 
15.2%

PAY_6
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.2911
Minimum-2
Maximum8
Zeros16286
Zeros (%)54.3%
Negative10635
Negative (%)35.4%
Memory size234.5 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.149987626
Coefficient of variation (CV)-3.950489954
Kurtosis3.42653413
Mean-0.2911
Median Absolute Deviation (MAD)0
Skewness0.9480293916
Sum-8733
Variance1.322471539
MonotonicityNot monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
016286
54.3%
-15740
 
19.1%
-24895
 
16.3%
22766
 
9.2%
3184
 
0.6%
449
 
0.2%
746
 
0.2%
619
 
0.1%
513
 
< 0.1%
82
 
< 0.1%
ValueCountFrequency (%)
-24895
 
16.3%
-15740
 
19.1%
016286
54.3%
22766
 
9.2%
3184
 
0.6%
449
 
0.2%
513
 
< 0.1%
619
 
0.1%
746
 
0.2%
82
 
< 0.1%
ValueCountFrequency (%)
82
 
< 0.1%
746
 
0.2%
619
 
0.1%
513
 
< 0.1%
449
 
0.2%
3184
 
0.6%
22766
 
9.2%
016286
54.3%
-15740
 
19.1%
-24895
 
16.3%

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct22723
Distinct (%)75.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean51223.3309
Minimum-165580
Maximum964511
Zeros2008
Zeros (%)6.7%
Negative590
Negative (%)2.0%
Memory size234.5 KiB

Quantile statistics

Minimum-165580
5-th percentile0
Q13558.75
median22381.5
Q367091
95-th percentile201203.05
Maximum964511
Range1130091
Interquartile range (IQR)63532.25

Descriptive statistics

Standard deviation73635.86058
Coefficient of variation (CV)1.437545339
Kurtosis9.806289341
Mean51223.3309
Median Absolute Deviation (MAD)21800.5
Skewness2.663861022
Sum1536699927
Variance5422239963
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02008
 
6.7%
390244
 
0.8%
78076
 
0.3%
32672
 
0.2%
31663
 
0.2%
250059
 
0.2%
39649
 
0.2%
240039
 
0.1%
41629
 
0.1%
50025
 
0.1%
Other values (22713)27336
91.1%
ValueCountFrequency (%)
-1655801
< 0.1%
-1549731
< 0.1%
-153081
< 0.1%
-143861
< 0.1%
-115451
< 0.1%
-106821
< 0.1%
-98021
< 0.1%
-90951
< 0.1%
-81871
< 0.1%
-74381
< 0.1%
ValueCountFrequency (%)
9645111
< 0.1%
7468141
< 0.1%
6530621
< 0.1%
6304581
< 0.1%
6266481
< 0.1%
6217491
< 0.1%
6138601
< 0.1%
6107231
< 0.1%
6085941
< 0.1%
6040191
< 0.1%

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct22346
Distinct (%)74.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean49179.07517
Minimum-69777
Maximum983931
Zeros2506
Zeros (%)8.4%
Negative669
Negative (%)2.2%
Memory size234.5 KiB

Quantile statistics

Minimum-69777
5-th percentile0
Q12984.75
median21200
Q364006.25
95-th percentile194792.2
Maximum983931
Range1053708
Interquartile range (IQR)61021.5

Descriptive statistics

Standard deviation71173.76878
Coefficient of variation (CV)1.447236829
Kurtosis10.30294592
Mean49179.07517
Median Absolute Deviation (MAD)20810
Skewness2.705220853
Sum1475372255
Variance5065705363
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02506
 
8.4%
390231
 
0.8%
32675
 
0.2%
78075
 
0.2%
31672
 
0.2%
39651
 
0.2%
250051
 
0.2%
240042
 
0.1%
-20029
 
0.1%
41628
 
0.1%
Other values (22336)26840
89.5%
ValueCountFrequency (%)
-697771
< 0.1%
-675261
< 0.1%
-333501
< 0.1%
-300001
< 0.1%
-262141
< 0.1%
-247041
< 0.1%
-247021
< 0.1%
-229601
< 0.1%
-186181
< 0.1%
-180881
< 0.1%
ValueCountFrequency (%)
9839311
< 0.1%
7439701
< 0.1%
6715631
< 0.1%
6467701
< 0.1%
6244751
< 0.1%
6059431
< 0.1%
5977931
< 0.1%
5868251
< 0.1%
5817751
< 0.1%
5776811
< 0.1%

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct22026
Distinct (%)73.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean47013.1548
Minimum-157264
Maximum1664089
Zeros2870
Zeros (%)9.6%
Negative655
Negative (%)2.2%
Memory size234.5 KiB

Quantile statistics

Minimum-157264
5-th percentile0
Q12666.25
median20088.5
Q360164.75
95-th percentile187821.05
Maximum1664089
Range1821353
Interquartile range (IQR)57498.5

Descriptive statistics

Standard deviation69349.38743
Coefficient of variation (CV)1.475106015
Kurtosis19.78325514
Mean47013.1548
Median Absolute Deviation (MAD)19708.5
Skewness3.087830046
Sum1410394644
Variance4809337537
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02870
 
9.6%
390275
 
0.9%
78074
 
0.2%
32663
 
0.2%
31662
 
0.2%
39648
 
0.2%
250040
 
0.1%
240039
 
0.1%
41629
 
0.1%
20027
 
0.1%
Other values (22016)26473
88.2%
ValueCountFrequency (%)
-1572641
< 0.1%
-615061
< 0.1%
-461271
< 0.1%
-340411
< 0.1%
-254431
< 0.1%
-247021
< 0.1%
-203201
< 0.1%
-177061
< 0.1%
-159101
< 0.1%
-156411
< 0.1%
ValueCountFrequency (%)
16640891
< 0.1%
8550861
< 0.1%
6931311
< 0.1%
6896431
< 0.1%
6896271
< 0.1%
6320411
< 0.1%
5974151
< 0.1%
5789711
< 0.1%
5779571
< 0.1%
5770151
< 0.1%

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct21548
Distinct (%)71.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean43262.94897
Minimum-170000
Maximum891586
Zeros3195
Zeros (%)10.7%
Negative675
Negative (%)2.2%
Memory size234.5 KiB

Quantile statistics

Minimum-170000
5-th percentile0
Q12326.75
median19052
Q354506
95-th percentile174333.35
Maximum891586
Range1061586
Interquartile range (IQR)52179.25

Descriptive statistics

Standard deviation64332.85613
Coefficient of variation (CV)1.487019671
Kurtosis11.30932483
Mean43262.94897
Median Absolute Deviation (MAD)18656
Skewness2.821965291
Sum1297888469
Variance4138716378
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
03195
 
10.7%
390246
 
0.8%
780101
 
0.3%
31668
 
0.2%
32662
 
0.2%
39644
 
0.1%
240039
 
0.1%
15039
 
0.1%
250034
 
0.1%
41633
 
0.1%
Other values (21538)26139
87.1%
ValueCountFrequency (%)
-1700001
< 0.1%
-813341
< 0.1%
-651671
< 0.1%
-506161
< 0.1%
-466271
< 0.1%
-345031
< 0.1%
-274901
< 0.1%
-243031
< 0.1%
-221081
< 0.1%
-203201
< 0.1%
ValueCountFrequency (%)
8915861
< 0.1%
7068641
< 0.1%
6286991
< 0.1%
6168361
< 0.1%
5728051
< 0.1%
5690341
< 0.1%
5656691
< 0.1%
5635431
< 0.1%
5480201
< 0.1%
5426531
< 0.1%

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct21010
Distinct (%)70.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean40311.40097
Minimum-81334
Maximum927171
Zeros3506
Zeros (%)11.7%
Negative655
Negative (%)2.2%
Memory size234.5 KiB

Quantile statistics

Minimum-81334
5-th percentile0
Q11763
median18104.5
Q350190.5
95-th percentile165794.3
Maximum927171
Range1008505
Interquartile range (IQR)48427.5

Descriptive statistics

Standard deviation60797.15577
Coefficient of variation (CV)1.508187617
Kurtosis12.30588129
Mean40311.40097
Median Absolute Deviation (MAD)17688.5
Skewness2.876379867
Sum1209342029
Variance3696294150
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
03506
 
11.7%
390235
 
0.8%
78094
 
0.3%
31679
 
0.3%
32662
 
0.2%
15058
 
0.2%
39647
 
0.2%
240039
 
0.1%
250037
 
0.1%
41636
 
0.1%
Other values (21000)25807
86.0%
ValueCountFrequency (%)
-813341
< 0.1%
-613721
< 0.1%
-530071
< 0.1%
-466271
< 0.1%
-375941
< 0.1%
-361561
< 0.1%
-304811
< 0.1%
-283351
< 0.1%
-230031
< 0.1%
-207531
< 0.1%
ValueCountFrequency (%)
9271711
< 0.1%
8235401
< 0.1%
5870671
< 0.1%
5517021
< 0.1%
5478801
< 0.1%
5306721
< 0.1%
5243151
< 0.1%
5161391
< 0.1%
5141141
< 0.1%
5082131
< 0.1%

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct20604
Distinct (%)68.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38871.7604
Minimum-339603
Maximum961664
Zeros4020
Zeros (%)13.4%
Negative688
Negative (%)2.3%
Memory size234.5 KiB

Quantile statistics

Minimum-339603
5-th percentile0
Q11256
median17071
Q349198.25
95-th percentile161912
Maximum961664
Range1301267
Interquartile range (IQR)47942.25

Descriptive statistics

Standard deviation59554.10754
Coefficient of variation (CV)1.53206613
Kurtosis12.27070529
Mean38871.7604
Median Absolute Deviation (MAD)16755
Skewness2.846644576
Sum1166152812
Variance3546691724
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
04020
 
13.4%
390207
 
0.7%
78086
 
0.3%
15078
 
0.3%
31677
 
0.3%
32656
 
0.2%
39645
 
0.1%
41636
 
0.1%
-1833
 
0.1%
240032
 
0.1%
Other values (20594)25330
84.4%
ValueCountFrequency (%)
-3396031
< 0.1%
-2090511
< 0.1%
-1509531
< 0.1%
-946251
< 0.1%
-738951
< 0.1%
-570601
< 0.1%
-514431
< 0.1%
-511831
< 0.1%
-466271
< 0.1%
-457341
< 0.1%
ValueCountFrequency (%)
9616641
< 0.1%
6999441
< 0.1%
5686381
< 0.1%
5277111
< 0.1%
5275661
< 0.1%
5149751
< 0.1%
5137981
< 0.1%
5119051
< 0.1%
5013701
< 0.1%
4991001
< 0.1%

PAY_AMT1
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct7943
Distinct (%)26.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5663.5805
Minimum0
Maximum873552
Zeros5249
Zeros (%)17.5%
Negative0
Negative (%)0.0%
Memory size234.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q11000
median2100
Q35006
95-th percentile18428.2
Maximum873552
Range873552
Interquartile range (IQR)4006

Descriptive statistics

Standard deviation16563.28035
Coefficient of variation (CV)2.924524575
Kurtosis415.2547427
Mean5663.5805
Median Absolute Deviation (MAD)1932
Skewness14.66836433
Sum169907415
Variance274342256.1
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05249
 
17.5%
20001363
 
4.5%
3000891
 
3.0%
5000698
 
2.3%
1500507
 
1.7%
4000426
 
1.4%
10000401
 
1.3%
1000365
 
1.2%
2500298
 
1.0%
6000294
 
1.0%
Other values (7933)19508
65.0%
ValueCountFrequency (%)
05249
17.5%
19
 
< 0.1%
214
 
< 0.1%
315
 
0.1%
418
 
0.1%
512
 
< 0.1%
615
 
0.1%
79
 
< 0.1%
88
 
< 0.1%
97
 
< 0.1%
ValueCountFrequency (%)
8735521
< 0.1%
5050001
< 0.1%
4933581
< 0.1%
4239031
< 0.1%
4050161
< 0.1%
3681991
< 0.1%
3230141
< 0.1%
3048151
< 0.1%
3020001
< 0.1%
3000391
< 0.1%

PAY_AMT2
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct7899
Distinct (%)26.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5921.1635
Minimum0
Maximum1684259
Zeros5396
Zeros (%)18.0%
Negative0
Negative (%)0.0%
Memory size234.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q1833
median2009
Q35000
95-th percentile19004.35
Maximum1684259
Range1684259
Interquartile range (IQR)4167

Descriptive statistics

Standard deviation23040.8704
Coefficient of variation (CV)3.891274139
Kurtosis1641.631911
Mean5921.1635
Median Absolute Deviation (MAD)1991
Skewness30.45381745
Sum177634905
Variance530881708.9
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05396
 
18.0%
20001290
 
4.3%
3000857
 
2.9%
5000717
 
2.4%
1000594
 
2.0%
1500521
 
1.7%
4000410
 
1.4%
10000318
 
1.1%
6000283
 
0.9%
2500251
 
0.8%
Other values (7889)19363
64.5%
ValueCountFrequency (%)
05396
18.0%
115
 
0.1%
220
 
0.1%
318
 
0.1%
411
 
< 0.1%
525
 
0.1%
68
 
< 0.1%
712
 
< 0.1%
89
 
< 0.1%
96
 
< 0.1%
ValueCountFrequency (%)
16842591
< 0.1%
12270821
< 0.1%
12154711
< 0.1%
10245161
< 0.1%
5804641
< 0.1%
4155521
< 0.1%
4010031
< 0.1%
3881261
< 0.1%
3852281
< 0.1%
3849861
< 0.1%

PAY_AMT3
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct7518
Distinct (%)25.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5225.6815
Minimum0
Maximum896040
Zeros5968
Zeros (%)19.9%
Negative0
Negative (%)0.0%
Memory size234.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q1390
median1800
Q34505
95-th percentile17589.4
Maximum896040
Range896040
Interquartile range (IQR)4115

Descriptive statistics

Standard deviation17606.96147
Coefficient of variation (CV)3.36931393
Kurtosis564.3112295
Mean5225.6815
Median Absolute Deviation (MAD)1795
Skewness17.21663544
Sum156770445
Variance310005092.2
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05968
 
19.9%
20001285
 
4.3%
10001103
 
3.7%
3000870
 
2.9%
5000721
 
2.4%
1500490
 
1.6%
4000381
 
1.3%
10000312
 
1.0%
1200243
 
0.8%
6000241
 
0.8%
Other values (7508)18386
61.3%
ValueCountFrequency (%)
05968
19.9%
113
 
< 0.1%
219
 
0.1%
314
 
< 0.1%
415
 
0.1%
518
 
0.1%
614
 
< 0.1%
718
 
0.1%
810
 
< 0.1%
912
 
< 0.1%
ValueCountFrequency (%)
8960401
< 0.1%
8890431
< 0.1%
5082291
< 0.1%
4175881
< 0.1%
4009721
< 0.1%
3970921
< 0.1%
3804781
< 0.1%
3717181
< 0.1%
3493951
< 0.1%
3442611
< 0.1%

PAY_AMT4
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct6937
Distinct (%)23.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4826.076867
Minimum0
Maximum621000
Zeros6408
Zeros (%)21.4%
Negative0
Negative (%)0.0%
Memory size234.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q1296
median1500
Q34013.25
95-th percentile16014.95
Maximum621000
Range621000
Interquartile range (IQR)3717.25

Descriptive statistics

Standard deviation15666.15974
Coefficient of variation (CV)3.246147995
Kurtosis277.3337677
Mean4826.076867
Median Absolute Deviation (MAD)1500
Skewness12.90498482
Sum144782306
Variance245428561.1
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
06408
 
21.4%
10001394
 
4.6%
20001214
 
4.0%
3000887
 
3.0%
5000810
 
2.7%
1500441
 
1.5%
4000402
 
1.3%
10000341
 
1.1%
2500259
 
0.9%
500258
 
0.9%
Other values (6927)17586
58.6%
ValueCountFrequency (%)
06408
21.4%
122
 
0.1%
222
 
0.1%
313
 
< 0.1%
420
 
0.1%
512
 
< 0.1%
616
 
0.1%
711
 
< 0.1%
87
 
< 0.1%
99
 
< 0.1%
ValueCountFrequency (%)
6210001
< 0.1%
5288971
< 0.1%
4970001
< 0.1%
4321301
< 0.1%
4000461
< 0.1%
3317881
< 0.1%
3309821
< 0.1%
3200081
< 0.1%
3130941
< 0.1%
2929621
< 0.1%

PAY_AMT5
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 6897
Distinct (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4799.387633
Minimum: 0
Maximum: 426529
Zeros: 6703
Zeros (%): 22.3%
Negative: 0
Negative (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 252.5
Median: 1500
Q3: 4031.5
95-th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3779

Descriptive statistics

Standard deviation: 15278.30568
Coefficient of variation (CV): 3.183386475
Kurtosis: 180.0639402
Mean: 4799.387633
Median Absolute Deviation (MAD): 1500
Skewness: 11.12741705
Sum: 143981629
Variance: 233426624.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    6703   22.3%
1000                 1340   4.5%
2000                 1323   4.4%
3000                 947    3.2%
5000                 814    2.7%
1500                 426    1.4%
4000                 401    1.3%
10000                343    1.1%
500                  250    0.8%
6000                 247    0.8%
Other values (6887)  17206  57.4%

Value  Count  Frequency (%)
0      6703   22.3%
1      21     0.1%
2      13     < 0.1%
3      13     < 0.1%
4      12     < 0.1%
5      9      < 0.1%
6      7      < 0.1%
7      9      < 0.1%
8      6      < 0.1%
9      6      < 0.1%

Value    Count  Frequency (%)
426529   1      < 0.1%
417990   1      < 0.1%
388071   1      < 0.1%
379267   1      < 0.1%
332000   1      < 0.1%
331788   1      < 0.1%
330982   1      < 0.1%
326889   1      < 0.1%
317077   1      < 0.1%
310135   1      < 0.1%

PAY_AMT6
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6939
Distinct (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5215.502567
Minimum: 0
Maximum: 528666
Zeros: 7173
Zeros (%): 23.9%
Negative: 0
Negative (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 117.75
Median: 1500
Q3: 4000
95-th percentile: 17343.8
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3882.25

Descriptive statistics

Standard deviation: 17777.46578
Coefficient of variation (CV): 3.408581541
Kurtosis: 167.1614296
Mean: 5215.502567
Median Absolute Deviation (MAD): 1500
Skewness: 10.64072733
Sum: 156465077
Variance: 316038289.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    7173   23.9%
1000                 1299   4.3%
2000                 1295   4.3%
3000                 914    3.0%
5000                 808    2.7%
1500                 439    1.5%
4000                 411    1.4%
10000                356    1.2%
500                  247    0.8%
6000                 220    0.7%
Other values (6929)  16838  56.1%

Value  Count  Frequency (%)
0      7173   23.9%
1      20     0.1%
2      9      < 0.1%
3      14     < 0.1%
4      12     < 0.1%
5      7      < 0.1%
6      6      < 0.1%
7      5      < 0.1%
8      6      < 0.1%
9      7      < 0.1%

Value    Count  Frequency (%)
528666   1      < 0.1%
527143   1      < 0.1%
443001   1      < 0.1%
422000   1      < 0.1%
403500   1      < 0.1%
377000   1      < 0.1%
372495   1      < 0.1%
351282   1      < 0.1%
345293   1      < 0.1%
308000   1      < 0.1%

default.payment.next.month
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
0: 23364
1: 6636

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      23364  77.9%
1      6636   22.1%

Length

Histogram of lengths of the category

Pie chart

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
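
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report generator runs: the helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, ties are assumed to receive the mean of the ranks they span, and inputs with zero variance are not handled.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)   # ranks agree exactly, so rho is 1 up to rounding
r = pearson_r(x, y)        # the linear measure is noticeably below 1
```

On this toy input the ranks of `x` and `y` coincide, so ρ reaches 1 while Pearson's r stays below it, which is the sense in which ρ captures nonlinear monotonic relationships.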

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
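
As a small sketch of that formula and of the invariance property, the following plain-Python snippet computes r for a toy sample and again after shifting and rescaling each variable. The function name `pearson_r` and the sample values are illustrative assumptions, and the scale factors are assumed positive (a negative factor would flip the sign of r).

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)  # 8 / (sqrt(10) * sqrt(10)) = 0.8

# Shift and rescale each variable separately (positive scale factors).
x_moved = [10 * a + 3 for a in x]
y_moved = [0.5 * b - 7 for b in y]
r_moved = pearson_r(x_moved, y_moved)  # same value as r, up to rounding
```

The common 1/(n-1) factors in the covariance and the standard deviations cancel, which is why the sketch works with raw sums.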

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
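
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below in plain Python. The function name `kendall_tau_a` is hypothetical, and the sketch is an assumption about the definition only: library implementations commonly compute the tau-b variant, which adjusts the denominator for ties, so results can differ on tied data.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])  # (5 - 1) / 6
```

The double loop is quadratic in the number of observations, which is fine for illustration but slow for a dataset of this size.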

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
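
A minimal plain-Python sketch of a bias-corrected Cramér's V in the spirit of Bergsma's correction follows. It assumes the commonly cited form of the correction (shrinking φ² and the table dimensions by terms in 1/(n-1)); the function name `cramers_v_corrected` is hypothetical and the code is not taken from the report generator.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_count in row_totals.items():
        for col_label, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Correction terms: subtract the expected bias and shrink the dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfect association: the second variable repeats the first.
labels = [0] * 50 + [1] * 50
v_perfect = cramers_v_corrected(labels, labels)

# Independence: every combination of labels is equally frequent.
v_independent = cramers_v_corrected([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25)
```

For the two toy inputs the corrected coefficient comes out at 1 (up to rounding) and 0 respectively, matching the end points of the range described above.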

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

(index)  LIMIT_BAL  SEX  EDUCATION  MARRIAGE  AGE  PAY_0  PAY_2  PAY_3  PAY_4  PAY_5  PAY_6  BILL_AMT1  BILL_AMT2  BILL_AMT3  BILL_AMT4  BILL_AMT5  BILL_AMT6  PAY_AMT1  PAY_AMT2  PAY_AMT3  PAY_AMT4  PAY_AMT5  PAY_AMT6  default.payment.next.month
0  20000.0  2  2  1  24  2  2  -1  -1  -2  -2  3913.0  3102.0  689.0  0.0  0.0  0.0  0.0  689.0  0.0  0.0  0.0  0.0  1
1  120000.0  2  2  2  26  -1  2  0  0  0  2  2682.0  1725.0  2682.0  3272.0  3455.0  3261.0  0.0  1000.0  1000.0  1000.0  0.0  2000.0  1
2  90000.0  2  2  2  34  0  0  0  0  0  0  29239.0  14027.0  13559.0  14331.0  14948.0  15549.0  1518.0  1500.0  1000.0  1000.0  1000.0  5000.0  0
3  50000.0  2  2  1  37  0  0  0  0  0  0  46990.0  48233.0  49291.0  28314.0  28959.0  29547.0  2000.0  2019.0  1200.0  1100.0  1069.0  1000.0  0
4  50000.0  1  2  1  57  -1  0  -1  0  0  0  8617.0  5670.0  35835.0  20940.0  19146.0  19131.0  2000.0  36681.0  10000.0  9000.0  689.0  679.0  0
5  50000.0  1  1  2  37  0  0  0  0  0  0  64400.0  57069.0  57608.0  19394.0  19619.0  20024.0  2500.0  1815.0  657.0  1000.0  1000.0  800.0  0
6  500000.0  1  1  2  29  0  0  0  0  0  0  367965.0  412023.0  445007.0  542653.0  483003.0  473944.0  55000.0  40000.0  38000.0  20239.0  13750.0  13770.0  0
7  100000.0  2  2  2  23  0  -1  -1  0  0  -1  11876.0  380.0  601.0  221.0  -159.0  567.0  380.0  601.0  0.0  581.0  1687.0  1542.0  0
8  140000.0  2  3  1  28  0  0  2  0  0  0  11285.0  14096.0  12108.0  12211.0  11793.0  3719.0  3329.0  0.0  432.0  1000.0  1000.0  1000.0  0
9  20000.0  1  3  2  35  -2  -2  -2  -2  -1  -1  0.0  0.0  0.0  0.0  13007.0  13912.0  0.0  0.0  0.0  13007.0  1122.0  0.0  0

Last rows

(index)  LIMIT_BAL  SEX  EDUCATION  MARRIAGE  AGE  PAY_0  PAY_2  PAY_3  PAY_4  PAY_5  PAY_6  BILL_AMT1  BILL_AMT2  BILL_AMT3  BILL_AMT4  BILL_AMT5  BILL_AMT6  PAY_AMT1  PAY_AMT2  PAY_AMT3  PAY_AMT4  PAY_AMT5  PAY_AMT6  default.payment.next.month
29990  140000.0  1  2  1  41  0  0  0  0  0  0  138325.0  137142.0  139110.0  138262.0  49675.0  46121.0  6000.0  7000.0  4228.0  1505.0  2000.0  2000.0  0
29991  210000.0  1  2  1  34  3  2  2  2  2  2  2500.0  2500.0  2500.0  2500.0  2500.0  2500.0  0.0  0.0  0.0  0.0  0.0  0.0  1
29992  10000.0  1  3  1  43  0  0  0  -2  -2  -2  8802.0  10400.0  0.0  0.0  0.0  0.0  2000.0  0.0  0.0  0.0  0.0  0.0  0
29993  100000.0  1  1  2  38  0  -1  -1  0  0  0  3042.0  1427.0  102996.0  70626.0  69473.0  55004.0  2000.0  111784.0  4000.0  3000.0  2000.0  2000.0  0
29994  80000.0  1  2  2  34  2  2  2  2  2  2  72557.0  77708.0  79384.0  77519.0  82607.0  81158.0  7000.0  3500.0  0.0  7000.0  0.0  4000.0  1
29995  220000.0  1  3  1  39  0  0  0  0  0  0  188948.0  192815.0  208365.0  88004.0  31237.0  15980.0  8500.0  20000.0  5003.0  3047.0  5000.0  1000.0  0
29996  150000.0  1  3  2  43  -1  -1  -1  -1  0  0  1683.0  1828.0  3502.0  8979.0  5190.0  0.0  1837.0  3526.0  8998.0  129.0  0.0  0.0  0
29997  30000.0  1  2  2  37  4  3  2  -1  0  0  3565.0  3356.0  2758.0  20878.0  20582.0  19357.0  0.0  0.0  22000.0  4200.0  2000.0  3100.0  1
29998  80000.0  1  3  1  41  1  -1  0  0  0  -1  -1645.0  78379.0  76304.0  52774.0  11855.0  48944.0  85900.0  3409.0  1178.0  1926.0  52964.0  1804.0  1
29999  50000.0  1  2  1  46  0  0  0  0  0  0  47929.0  48905.0  49764.0  36535.0  32428.0  15313.0  2078.0  1800.0  1430.0  1000.0  1000.0  1000.0  1

Duplicate rows

Most frequently occurring

(index)  LIMIT_BAL  SEX  EDUCATION  MARRIAGE  AGE  PAY_0  PAY_2  PAY_3  PAY_4  PAY_5  PAY_6  BILL_AMT1  BILL_AMT2  BILL_AMT3  BILL_AMT4  BILL_AMT5  BILL_AMT6  PAY_AMT1  PAY_AMT2  PAY_AMT3  PAY_AMT4  PAY_AMT5  PAY_AMT6  default.payment.next.month  # duplicates
0  20000.0  1  2  2  24  2  2  4  4  4  4  1650.0  1650.0  1650.0  1650.0  1650.0  1650.0  0.0  0.0  0.0  0.0  0.0  0.0  1  2
1  50000.0  1  2  2  26  1  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2
2  50000.0  2  1  2  23  1  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2
3  80000.0  2  2  1  31  -2  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2
4  80000.0  2  2  2  25  -2  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2
5  80000.0  2  3  1  42  -2  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2
6  90000.0  2  1  2  31  1  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2
7  100000.0  2  2  1  49  1  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2
8  110000.0  2  1  2  31  1  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2
9  140000.0  1  1  2  29  1  -2  -2  -2  -2  -2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0  2